Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT
Abstract
We investigate pivot-based translation between related languages in a low-resource, phrase-based SMT setting. We show that a subword-level pivot-based SMT model using a related pivot language is substantially better than word-level and morpheme-level pivot models. It is also highly competitive with the best direct translation model, which is encouraging because no direct source-target training corpus is used. We also show that combining multiple related-language pivot models can rival a direct translation model. Thus, the use of subwords as translation units, coupled with multiple related pivot languages, can compensate for the lack of a direct parallel corpus.
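The abstract does not say which pivoting strategy is used. As an illustration only, the sketch below shows phrase-table triangulation, one common way to build a source-target model from source-pivot and pivot-target models by marginalizing over pivot phrases: p(t | s) ≈ Σ_p p(p | s) · p(t | p). The function name `triangulate`, the dictionary layout, and the toy entries are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Combine two toy phrase tables by summing over shared pivot phrases.

    Each table maps a phrase to {phrase: probability}. This layout is an
    assumption made for the sketch, not a real phrase-table file format.
    """
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_to_tgt.get(pivot, {}).items():
                src_to_tgt[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {src: dict(tgts) for src, tgts in src_to_tgt.items()}


# Toy example with placeholder units (hypothetical data).
src_to_pivot = {"s1": {"p1": 0.5, "p2": 0.5}}
pivot_to_tgt = {"p1": {"t1": 1.0}, "p2": {"t1": 0.5, "t2": 0.5}}
table = triangulate(src_to_pivot, pivot_to_tgt)
print(table)
```

In a subword-level setting the keys would be sequences of subword units instead of words. Models obtained through several pivot languages could then be combined, for instance by interpolating the resulting tables, although the abstract does not specify the combination method.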
Similar resources
Utilizing Lexical Similarity for pivot translation involving resource-poor, related languages
We investigate pivot-based translation between related languages in a low-resource, phrase-based SMT setting. We show that a subword-level pivot-based SMT model using a related pivot language is substantially better than word-level and morpheme-level pivot models. It is also highly competitive with the best direct translation model, which is encouraging as no direct source-target training corpus is us...
Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT
Recent research on multilingual statistical machine translation (SMT) focuses on the usage of pivot languages in order to overcome resource limitations for certain language pairs. This paper proposes a new method to translate a dialect language into a foreign language by integrating transliteration approaches based on Bayesian co-segmentation (BCS) models with pivot-based SMT approaches. The ad...
Statistical Machine Translation between Related Languages
Language-independent Statistical Machine Translation (SMT) has proven to be very challenging. The diversity of languages makes high accuracy difficult and requires substantial parallel corpora as well as linguistic resources (parsers, morphological analyzers, etc.). An interesting observation is that a large chunk of machine translation (MT) requirements involve related languages. They are either: (i) ...
Local lexical adaptation in Machine Translation through triangulation: SMT helping SMT
We present a framework where auxiliary MT systems are used to provide lexical predictions to a main SMT system. In this work, predictions are obtained by means of pivoting via auxiliary languages, and introduced into the main SMT system in the form of a low-order language model, which is estimated on a sentence-by-sentence basis. The linear combination of models implemented by the decoder is thu...
Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation
An important challenge to statistical machine translation (SMT) is the lack of parallel data for many language pairs. One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages. Although pivoting is a robust technique, it introduces some low-quality translations. In this paper, we present two language-independent features...